110 PART 3 Getting Down and Dirty with Data
» Spot-checking data entry: If you are entering data from forms or printed
material, choose a percentage of entries to double-check (for example,
10 percent of the forms you entered). This can help you tell whether there
are any systematic data entry errors or missing data.
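The spot check described above can be sketched in a few lines of Python. This is only an illustration: the function name `pick_forms_to_recheck`, the form numbering, and the fixed seed are assumptions made for the example, and any selection method works as long as every entered form has the same chance of being picked.

```python
import math
import random


def pick_forms_to_recheck(form_ids, fraction=0.10, seed=2024):
    """Return a random sample of form IDs to double-check by hand.

    fraction is the share of entered forms to re-verify (0.10 = 10 percent).
    The seed is fixed only so the same sample can be reproduced later.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    form_ids = list(form_ids)
    if not form_ids:
        return []
    # Round up so that even a small batch gets at least one form checked.
    # The inner round() guards against floating-point noise such as 3.0000000000000004.
    sample_size = math.ceil(round(len(form_ids) * fraction, 9))
    rng = random.Random(seed)
    return sorted(rng.sample(form_ids, sample_size))


# Hypothetical example: 250 entered forms numbered 1 through 250.
forms_to_check = pick_forms_to_recheck(range(1, 251))
print(len(forms_to_check), "forms selected for double-checking")
```

Drawing the sample at random, rather than checking the first or last forms entered, gives a better chance of catching errors that occur throughout the data entry process.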
Creating a File that Describes
Your Data File
Every research database, large or small, simple or complicated, should include a
data dictionary that describes the variables contained in the database. It is a neces-
sary part of study documentation that needs to be accessible to the research team.
A data dictionary is usually set up as a table (often in Excel), where each row
documents one variable in the database. For each variable, the dictionary
should contain the following information (sometimes referred to as
metadata, which means “data about data”):
» A variable name (usually no more than ten characters) that’s used when
telling the software what variables you want it to use in an analysis
» A longer verbal description of the variable in a human-readable format (in
other words, a person reading this description should be able to understand
the content of the variable)
» The type of data (text, categorical, numerical, date/time, and so on)
• If numerical: How the number is displayed (how many digits appear
before and after the decimal point)
• If date/time: How it’s formatted (for example, 12/25/13 10:50pm or
25Dec2013 22:50)
• If categorical: What codes and descriptors exist for each level of the
category (these are often called picklists, and can be documented on a
separate tab in an Excel data dictionary)
» How missing values are represented in the database (99, 999, “NA,”
and so on)
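To make the list above concrete, here is a minimal Python sketch of a data dictionary with one row per variable, written out as CSV text that Excel can open. The variable names, descriptions, formats, and missing-value codes are all hypothetical and were invented for this illustration; the column layout is one reasonable choice, not a required standard.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class VariableEntry:
    """One row of a data dictionary: the metadata for a single variable."""
    name: str            # short name used in analysis software
    description: str     # longer, human-readable description
    data_type: str       # text, categorical, numerical, date/time, ...
    display_format: str  # digits for numbers, pattern for dates, codes for categories
    missing_code: str    # how a missing value is represented


# Hypothetical variables, invented for illustration only.
entries = [
    VariableEntry("weight_kg", "Body weight at enrollment, in kilograms",
                  "numerical", "3 digits before and 1 after the decimal point", "999"),
    VariableEntry("visit_dt", "Date and time of the enrollment visit",
                  "date/time", "25Dec2013 22:50", "NA"),
    VariableEntry("smoker", "Self-reported smoking status",
                  "categorical", "0 = Never; 1 = Former; 2 = Current", "99"),
]


def to_csv(rows):
    """Write the dictionary as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(VariableEntry)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


csv_text = to_csv(entries)
print(csv_text)
```

For a categorical variable with many levels, the codes would more likely live in a separate picklist table than in a single cell, as the bullet above suggests.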
Database programs that use SQL and statistical programs like SAS often have a
function that can output information like this about a data set, but the output
still needs to be curated by a human. It may be helpful to start your data
dictionary with such output, but it is best to complete it in Excel. That way,
you can add the human curation yourself, and other research team members can
easily open the data dictionary to better understand the variables in the database.
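The idea of starting from software output and then curating by hand can be sketched in Python. This is not the SAS or SQL function mentioned above. It is a stand-in that drafts one entry per column of a small, made-up data set. The sample rows, the set of missing-value codes, and the function names are assumptions for the example.

```python
import csv
import io

# Hypothetical raw data; "999" and "NA" stand in for missing values here.
RAW_CSV = """subj_id,age_yr,sex,weight_kg
001,34,F,61.5
002,999,M,80.2
003,51,NA,999
"""

MISSING_CODES = {"99", "999", "NA", ""}  # assumed codes for this example


def guess_type(values):
    """Guess 'numerical' if every non-missing value parses as a number, else 'text'."""
    observed = [v for v in values if v not in MISSING_CODES]
    if not observed:
        return "unknown"
    try:
        for v in observed:
            float(v)
    except ValueError:
        return "text"
    return "numerical"


def draft_dictionary(csv_text):
    """Return one draft entry per column; descriptions are left for a human."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    draft = []
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        draft.append({
            "name": column,
            "description": "",  # to be written by a member of the research team
            "data_type": guess_type(values),
            "n_missing": sum(v in MISSING_CODES for v in values),
        })
    return draft


draft = draft_dictionary(RAW_CSV)
for entry in draft:
    print(entry)
```

The draft labels subj_id as numerical because its values look like numbers, even though an ID is better treated as text. The description column is also empty. Both are examples of what the automated output cannot decide and a human curator has to supply.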